arXiv:1504.06740v1 [cs.CV] 25 Apr 2015


SIFT Vs SURF: Quantifying the Variation in Transformations 

Siddharth Srivastava 

Department of Electrical Engineering, Indian Institute of Technology, Delhi 
eezl27506@iitd.ac.in 


Abstract —This paper studies the robustness of SIFT and SURF 
against different image transforms (rigid body, similarity, affine and 
projective) by quantitatively analyzing the variations in the extent
of transformations. Previous studies have compared the two
techniques on absolute transformations rather than on the specific amount
of deformation caused by the transformation. The paper presents
an exhaustive empirical analysis of such deformations and of the matching
capability of SIFT and SURF under variations in the matching parameters
and the amount of tolerance. This is helpful in choosing the specific use
case for applying these techniques.

Index Terms —SIFT, SURF, Image Transformations, Image Classification


I. Introduction 

Natural images may suffer from many deformations such as rotation,
scale change, shear and viewpoint change. Geometric transformations are
used to model these deformations. There are primarily four categories
of transformations, namely Rigid Body Transformation, Similarity
Transformation, Affine Transformation and Perspective Transformation,
with the perspective transformation being the most general of the
four. Since these transformations are very common in real-world
images, it becomes important to be able to analyze the images
while minimizing the effect of the deformations introduced while
capturing them. This is achieved by describing an image as a set of
features which uniquely identifies the image, acting as a fingerprint
for it. Many techniques have been proposed and compared with each
other for this purpose [1], [2], [3]. SIFT [1] and SURF are two
such widely used feature detection techniques.

The primary aim of this paper is to apply various transformations
to a dataset of images and study the matching capability of
SIFT and SURF features. The paper is organized as follows. Section
II describes the methodology used in this paper with regard to the
transformations considered, the feature extraction techniques and the
matching algorithm. This is followed by a discussion of the results
obtained in Section III, followed by the conclusion in Section IV.

II. Methodology 

This section discusses the methodology adopted in our work.
We begin by presenting a discussion of the dataset used and the
motivation for choosing the dataset. Subsequently, we detail the
transformations considered as applied to the dataset. We then present a
brief discussion of the feature extraction techniques and the matching
algorithm used.

A reference dataset consisting of 10 images from the Oxford
Buildings dataset has been chosen. The size of the images in the
original dataset is either 1024x768 or 768x1024. For computational
efficiency, the images have been scaled down by 50% along both
dimensions. The images have been chosen to test SIFT and SURF
on differing categories of image content. While the dataset
consists of buildings, each image has been chosen keeping certain
parameters in mind. Fig. 1(a) has a lot of fine details. Fig. 1(b) and
Fig. 1(c) are of the same building under different lighting and viewing
conditions. Fig. 1(c) is the front view of a normal building and Fig. 1(d)
has textural details. Such images were chosen so that the above
mentioned factors are covered when testing the robustness
of the feature matching techniques against the transformations
discussed in the previous section.




Fig. 1. The dataset used for the study (derived from Oxford Buildings 
Dataset) 


A. Image Transformations 

The following transformations were applied to the images to generate
a cumulative dataset for testing.

1) Scaling: The original images were scaled in the following ratios
with respect to the reference images: 0.125, 0.25, 0.5, 0.75, 2
and 4.

2) Rotation: The original image was rotated by the following
angles in the anticlockwise direction (degrees): 10, 20, 30, 40,
50, 90 and 180.

3) Similarity Transform: On the rotated images generated previously,
scaling was applied in the following ratios: 0.25, 0.5, 2
and 4. This resulted in 28 similarity transform images (7 rotation
angles x 4 scales) per reference image.

4) Affine Transform: Each reference image was transformed with
5 different affine transformations. First, an affine transform was
obtained by mapping the top left, top right and bottom left
corners of the reference image to different locations in the target
image. The obtained affine transform matrix was then applied
to the entire image to obtain the affine transformed image.
The transformations applied are shown in Fig. 2. Fig. 2(a) is the
reference image, with red, blue and green patches indicating
the corners considered for obtaining the transform matrix. The
corresponding corners are also shown in the affine transformed
images in Fig. 2(b) - Fig. 2(f). As can be seen from the
transformed images, the parallel lines from the reference image
are preserved in all the transformed images.












5) Perspective Transform: Each reference image was transformed
with 5 different perspective transformation matrices. Though the
affine transform is a special case of the perspective transform, to
study the effect of both affine and perspective transformations
quantitatively, the three points considered in the affine
transformation were kept the same in the perspective transformation, but
the fourth point (bottom right corner of the image) was varied
in these transformations.

Fig. 3(a) is the reference image indicating the corner points
with colored patches. Comparing it with Fig. 2, we can see
that Fig. 3(b), 3(c) and 3(e) correspond to Fig. 2(b), 2(c) and
2(e) respectively. In these cases, the perspective transform was
obtained by keeping the transformed location of the fourth
corner point aligned proportionally with the three corners of
the affine transformation. This shows that the affine transform is
indeed a specific case of the perspective transform.

Another point observed is that the perspective transform only
preserves straight lines. As can be seen from Fig. 3(d) and
Fig. 3(f), the parallelism among the lines has been lost.
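The relationship between the two transforms can be checked numerically.
The following is a minimal sketch, not the code used in the study: it
solves for an affine matrix from three corner correspondences and for a
perspective matrix from four, using exact rational arithmetic from the
Python standard library. The image size and all target coordinates are
hypothetical values chosen for illustration.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def affine_from_corners(src, dst):
    """3x3 affine matrix mapping three source points onto three targets."""
    a = [[x, y, 1] for x, y in src]
    row_x = solve(a, [u for u, _ in dst])
    row_y = solve(a, [v for _, v in dst])
    return [row_x, row_y, [Fraction(0), Fraction(0), Fraction(1)]]


def homography_from_corners(src, dst):
    """3x3 perspective matrix (last entry fixed to 1) from four point pairs."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(a, b)
    return [h[0:3], h[3:6], h[6:8] + [Fraction(1)]]


def apply(m, point):
    """Apply a 3x3 matrix to a 2D point in homogeneous coordinates."""
    x, y = point
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return ((m[0][0] * x + m[0][1] * y + m[0][2]) / w,
            (m[1][0] * x + m[1][1] * y + m[1][2]) / w)


# Hypothetical 512x384 image: top left, top right, bottom left corners.
corners3 = [(0, 0), (512, 0), (0, 384)]
targets3 = [(50, 30), (460, 60), (80, 350)]
bottom_right = (512, 384)

A = affine_from_corners(corners3, targets3)
br_affine = apply(A, bottom_right)  # where the affine map sends the 4th corner

# Fourth corner placed consistently with the affine map: still affine.
H_aff = homography_from_corners(corners3 + [bottom_right], targets3 + [br_affine])
# Fourth corner moved elsewhere: a genuinely perspective transform.
H_persp = homography_from_corners(corners3 + [bottom_right], targets3 + [(400, 300)])
```

When the fourth corner follows the affine map, the last row of the
solved matrix is exactly (0, 0, 1), so the perspective transform reduces
to the affine one. Moving the fourth corner makes the first two entries
of that row non-zero, which is what breaks parallelism.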



Fig. 2. Affine Transformations applied to the images.

Fig. 4. Bending the line.

Fig. 5. Bending the line.

An attempt was made to deform straight lines by applying a
transform which bends the line joining the top left and top right
corners along the center of the line, as shown in Fig. 4. The transformed
image is shown in Fig. 5.

As can be seen from Fig. 5, the transformed image has visual
loss in terms of intensity changes, and the deformation is not
at all close to the expected deformation of Fig. 4. The attempt to
recover Fig. 3(a) by applying the inverse perspective transform on
Fig. 5 failed.



B. Feature Extraction 

This section discusses the feature extraction algorithms used. 

1) Scale Invariant Feature Transform (SIFT): The SIFT algorithm 
is described in brief as follows: 

1) SIFT applies Gaussian filters to the image at various scales.
The blurred images are grouped into octaves, each octave being a
collection of successively blurred images. Successive octaves differ
in image size (usually 1/2 of the previous octave). This is called
scale space analysis. It then calculates the Difference of Gaussians
(DoG) from successively blurred images, which provides scale
invariance.

2) To find the keypoints, it locates the maxima and minima in the DoG
images and then refines them to sub-pixel maxima and minima
using a Taylor series expansion.

3) Next, erroneous keypoints are eliminated by thresholding, which
rejects low-contrast points and edge responses so that stronger,
corner-like keypoints are retained.

4) Then an orientation is assigned to each keypoint from a
region whose size depends upon the scale of the keypoint. Since the
orientation of each sub-region is measured relative to the orientation
of the keypoint's region (by subtraction), rotation invariance is
achieved.

5) Feature estimation: A 16x16 region around the keypoint is
considered and the orientations are calculated for each 4x4
sub-region in it. For each sub-region a histogram with 8 bins is
built, in which the contribution of each orientation depends upon
its distance from the keypoint. This is achieved with the help of
a Gaussian weighting function, which also provides robustness
to deformations and translation. Since there are 4x4 = 16 sub-regions
with 8 bins each, SIFT calculates a 128-dimensional feature vector.
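The layout of the 128 values in step 5 can be illustrated with a
simplified sketch. This is not the reference SIFT implementation: it
omits rotation normalisation, interpolation between bins and the
clipping step, and the Gaussian width (sigma = 8 pixels) is an
assumption made here. The function name is hypothetical.

```python
import math


def sift_like_descriptor(patch):
    """Build a 128-value descriptor from a 16x16 grey-level patch.

    The patch is split into 4x4 cells of 4x4 pixels; each cell gets an
    8-bin histogram of gradient orientations, weighted by gradient
    magnitude and by a Gaussian centred on the patch.
    """
    assert len(patch) == 16 and all(len(row) == 16 for row in patch)
    desc = [0.0] * 128
    sigma = 8.0  # assumed: half the patch width
    for y in range(16):
        for x in range(16):
            # Finite differences, clamped at the patch border.
            dx = patch[y][min(x + 1, 15)] - patch[y][max(x - 1, 0)]
            dy = patch[min(y + 1, 15)][x] - patch[max(y - 1, 0)][x]
            magnitude = math.hypot(dx, dy)
            angle = math.atan2(dy, dx) % (2 * math.pi)
            bin_index = int(angle / (2 * math.pi) * 8) % 8
            weight = math.exp(-((x - 7.5) ** 2 + (y - 7.5) ** 2) / (2 * sigma ** 2))
            cell = (y // 4) * 4 + (x // 4)  # which of the 16 cells
            desc[cell * 8 + bin_index] += weight * magnitude
    norm = math.sqrt(sum(v * v for v in desc)) or 1.0
    return [v / norm for v in desc]


# A horizontal intensity ramp: every gradient points along +x.
patch = [[float(x) for x in range(16)] for _ in range(16)]
d = sift_like_descriptor(patch)
```

For the ramp patch, all gradient energy falls into the first
orientation bin of each of the 16 cells, so only every eighth entry of
the descriptor is non-zero.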

2) Speeded Up Robust Features (SURF): SURF is also a feature
extraction technique, which claims to be more robust and faster than
SIFT. The key differences from SIFT as described above are listed
below:

1) SURF uses Integral images for speeding up the calculations. 

2) Although SURF also creates octaves, it does not scale down
the image; instead it changes the size of the box filter. (Scale
Invariance)

3) Finding Keypoints: It uses the determinant of the Hessian for this
purpose, which helps in expressing the local changes.

4) Then the Haar wavelet responses are calculated, again depending
upon the scale, similar to SIFT.

5) In the step above, each of the 4x4 sub-regions gives 4 values (Haar
wavelet responses), hence SURF calculates a 64-dimensional
feature vector.
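The speed-up from integral images in step 1 comes from the fact that
the sum over any rectangular box can be read off with four lookups,
regardless of the box size. The following is a minimal sketch of that
idea with a small made-up image; the function names are chosen here
for illustration.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of all
    pixels above and to the left of (y, x), exclusive."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, height, width):
    """Sum of the pixels in a box, using four lookups in the table."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
ii = integral_image(img)
centre = box_sum(ii, 1, 1, 2, 2)   # 6 + 7 + 10 + 11
whole = box_sum(ii, 0, 0, 4, 4)    # all sixteen pixels
```

Because the cost of `box_sum` does not depend on the box dimensions,
enlarging the box filter instead of shrinking the image (step 2) adds
no extra work per filter response.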


C. Feature Matching 

The SIFT and SURF descriptors were matched using the Fast
Library for Approximate Nearest Neighbors (FLANN).
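To make the matching step concrete, the sketch below performs an exact
brute-force nearest-neighbour search over descriptors. This is a
stand-in for illustration only: FLANN performs an approximate search
using index structures and is not reproduced here. The descriptors are
made-up 2D vectors and the function name is hypothetical.

```python
import math


def nearest_neighbour_matches(query, train):
    """For every query descriptor, find the closest train descriptor.

    Returns a list of (query_index, train_index, distance) tuples using
    the Euclidean distance.
    """
    result = []
    for qi, q in enumerate(query):
        best_index, best_distance = -1, math.inf
        for ti, t in enumerate(train):
            distance = math.dist(q, t)
            if distance < best_distance:
                best_index, best_distance = ti, distance
        result.append((qi, best_index, best_distance))
    return result


# Toy 2D "descriptors"; real SIFT and SURF vectors have 128 and 64 values.
query = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
train = [(5.0, 4.0), (0.0, 0.5), (1.0, 1.0)]
matches = nearest_neighbour_matches(query, train)
```

The brute-force search costs one distance computation per pair of
descriptors, which is the cost an approximate method such as FLANN is
designed to reduce.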

D. Matching Accuracy and False Positives 

Matching accuracy is calculated by the following formula:

\[
\text{Accuracy} = \left( 1 - \frac{\text{False Positives}}{\text{Total Matches}} \right) \times 100 \tag{1}
\]


where false positives are the number of erroneously matched keypoints.

False positives are identified by projecting the matched keypoints
from the reference image to the transformed image.
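The accuracy measure can be sketched as follows. The sketch assumes
that a match counts as correct when the matched keypoint lies inside
an n x n pixel window centred on the projected location, with n = 3 or
9. This reading of the neighbourhood, the function names and the
example coordinates are assumptions made for illustration.

```python
def project(h, point):
    """Project a point with a 3x3 transformation matrix."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def is_false_positive(h, ref_point, matched_point, neighbourhood=3):
    """True if the match falls outside the window around the projection."""
    half = neighbourhood // 2
    px, py = project(h, ref_point)
    return (abs(matched_point[0] - px) > half
            or abs(matched_point[1] - py) > half)


def matching_accuracy(h, matches, neighbourhood=3):
    """Accuracy in percent: (1 - false positives / total matches) * 100."""
    if not matches:
        raise ValueError("no matches to evaluate")
    false_positives = sum(
        is_false_positive(h, ref, got, neighbourhood) for ref, got in matches)
    return (1 - false_positives / len(matches)) * 100


# Known transform: scaling by 0.5 (made-up example).
H = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 1.0]]
# (keypoint in reference image, matched keypoint in transformed image)
pairs = [((100, 100), (50, 50)),   # exact
         ((40, 20), (21, 10)),     # one pixel off
         ((10, 10), (9, 5)),       # four pixels off
         ((200, 100), (10, 10))]   # wrong match
acc_3x3 = matching_accuracy(H, pairs, neighbourhood=3)
acc_9x9 = matching_accuracy(H, pairs, neighbourhood=9)
```

With these four pairs, the 3x3 window accepts two matches and the 9x9
window accepts three, so the larger window yields the higher accuracy.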


III. Results

The implementation was done using OpenCV 2.4.6 with Qt 5.0.2.
The OpenCV implementations of SIFT, SURF and FLANN were used
for obtaining results. Additional parameters for result generation:

1) Nearest neighbor distance: The distance threshold for accepting
a match was varied as t * min_distance, where min_distance is the
minimum distance between matched descriptors and t = {2, 5, 10}.

2) False Positives: Two neighbourhood sizes, 3x3 and 9x9, were
used for marking a match as a false positive.
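The distance threshold in parameter 1 can be sketched as a simple
filter over the matches. Whether the comparison is inclusive, the
tuple layout and the example distances are assumptions made here for
illustration.

```python
def filter_matches(matches, t):
    """Keep matches whose distance is at most t times the smallest one.

    Each match is a (query_index, train_index, distance) tuple.
    """
    if not matches:
        return []
    min_distance = min(distance for _, _, distance in matches)
    return [m for m in matches if m[2] <= t * min_distance]


# Made-up descriptor distances for four candidate matches.
candidates = [(0, 3, 0.10), (1, 7, 0.18), (2, 1, 0.45), (3, 4, 0.95)]
kept = {t: len(filter_matches(candidates, t)) for t in (2, 5, 10)}
```

A larger t admits more matches, including weaker ones, which is why a
higher threshold can raise the number of false positives. Note that if
the smallest distance is zero, this filter keeps only exact matches.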

Each result considered in the following section has been obtained 
by averaging the results for individual images for corresponding 
matches. 


A. Scaling 

The effect of scaling on the matching accuracy is shown in
Fig. 6. As shown in the plot, as the distance threshold increases, the
matching accuracy usually decreases. The reason for this is that a
greater threshold allows more keypoints to be matched, but it also
increases the number of false positives.

The plot also indicates that SIFT is more robust to scale changes
than SURF, as shown by its higher and more consistent matching accuracy.
The plot of SURF (t1, n1) also indicates that SURF is more
stable at lower scales than at higher scales.

It was also expected that as the size of the neighborhood for finding
false positives increases (3x3 to 9x9), the matching accuracy
should increase, which is also evident from the plot. For
example, SURF (t2, n2) has uniformly higher accuracy than SURF
(t2, n1). Hence, using two different neighborhoods would not be very
significant in deciding the robustness of the techniques, and the rest
of the results consider only the 3x3 neighborhood for
comparing results.

Fig. 6. Effect of scaling (SIFT vs SURF, scale changes).
t1 = 2*min_distance, t2 = 5*min_distance; n1 = 3x3, n2 = 9x9.

B. Rotation 

The plot for rotation shown in Fig. 7 compares the robustness at
10, 20, 30, 40, 50, 90 and 180 degrees.


Fig. 7. Effect of rotation (SIFT vs SURF). Legend: SIFT(t1), SURF(t1),
SIFT(t2), SURF(t2), SIFT(t3), SURF(t3).

As indicated by the plot, SIFT outperforms SURF in the consistency
of its matches at various angles. SURF's rotation
invariance decreases as the angle of rotation increases. But at
90 degrees and 180 degrees, the plot shows that SURF achieves
matching accuracy comparable to SIFT. This anomaly can be
attributed to the type of images in the dataset. Since the images are
of buildings consisting mostly of vertical and horizontal lines,
with corners oriented symmetrically at doors, windows
and edges of the building, matching at 90 and 180 degrees finds
mostly the same keypoints, which have stronger correspondence
with the reference image than at other orientations.

C. Affine Transform 

The plot of affine transform (Fig. 8) shows the matching accuracy
of SIFT and SURF with the different affine transforms shown in
Fig. 2 applied to the image, where A1 corresponds to Fig. 2(b) and
so on.

The plot indicates that SIFT outperforms SURF on invariance
to affine transformation. For A1 to A3, SURF is quite close to
SIFT. This is owed to the fact that A1 and A3 only have 10
to 30% translation of the corners along both axes, while A2 is
essentially a scaled down version of the reference image. For A4 and
A5, SIFT's matching accuracy is much higher than SURF's but is still
not very high. Even on increasing the threshold (t1, t2, t3), the
matching accuracy does not see any relative improvement, as already
shown and discussed in the section on scaling. A5 is the strongest
affine transform, and the matching accuracy drops considerably when
compared to the other affine transforms. Hence, it can be said that
SIFT is invariant only to mild affine transformations, i.e. those which
are essentially a rotation or scale change across the axes.

Fig. 8. Effect of Affine Transformations (SIFT vs SURF).

Fig. 10. Effect of Similarity Transformations (SIFT vs SURF).

D. Perspective Transform 

In Fig. 9, P1 to P5 correspond to Fig. 3(b)-(f) respectively. For P1
and P2, SIFT and SURF have comparable accuracies, owing to the
fact that these are essentially translated and scaled down versions of
the reference image.

Fig. 9. Effect of Perspective Transformations (SIFT vs SURF).

For P4, SIFT has average matching accuracy; compared to
P3 and P5, P4 is a more restrictive transform. For P3 and P5, SIFT and
SURF have very poor matching accuracy, which can be explained by the
fact that these images correspond to more complex perspective
transforms than the others. Hence, it can be concluded that SIFT and
SURF both have extremely poor invariance to perspective transform
when the viewpoint change is large, while they have average matching
accuracy in case of mild viewpoint change.

E. Similarity Transform 

SIFT and SURF both have comparable matching capabilities for
lower angles and scales, while SIFT outperforms SURF for the others.
These results follow directly from the discussion on scale and rotation
invariance above.


IV. Conclusion 

Matching performance of SURF and SIFT was compared for
trends in various transformations, and anomalies arising in the
results were analyzed. It was found that SIFT outperforms SURF on
almost all occasions, while both perform poorly for perspective
transformations and are only partly stable towards affine
transformations. Existing studies comparing them took only one parameter
into consideration when drawing conclusions. The performance of SIFT
and SURF at specific scales, and the effect of increasing or
decreasing the scale, was demonstrated, establishing that though SIFT
and SURF are both invariant at lower scales, SIFT outperforms SURF
at higher scales. A similar pattern was observed for rotation changes.